
[LTX2] better timing and profiling capabilities#389

Open
mbohlool wants to merge 1 commit into main from mehdy_perf

@mbohlool (Collaborator) commented Apr 23, 2026

Description:
This PR introduces better timing and profiling capabilities to the LTX2 generation pipeline to help identify performance bottlenecks.

Key Changes:

Detailed Timing: Added time.perf_counter() blocks and jax.block_until_ready() calls across the pipeline to accurately measure text encoding, connector passes, denoising steps, VAE decoding, and post-processing.

Multi-Pass Execution: Updated generate_ltx2.py to support a three-stage execution flow:

Warmup Pass: For JIT compilation.

Generation Pass: For actual output and standard timing.

Profiling Pass: (Optional) Captured via max_utils.Profiler for a subset of steps.

Enhanced Logging: Added a summary table for Load, Compile, and Inference times.

Example output:

==================================================
  TIMING SUMMARY
==================================================
  Load (checkpoint):     106.7s
  Compile:                81.5s
  ────────────────────────────────────────
  Inference:              16.4s
    Text Encoding:         3.6s
    Preparation:           0.0s
    Connectors:            0.0s
    Denoising:             7.2s
    Latent Upsampler:      0.0s
    Latent Processing:     0.6s
    Video VAE:             1.5s
    Video Post:            3.2s
    Audio VAE:             0.1s
    Vocoder:               0.2s
==================================================
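A table like the one above could be rendered from a dictionary of per-stage timings. The following is a minimal sketch, not the code in this PR: `format_timing_summary` and `_row` are hypothetical names, and treating the Inference total as the sum of the stage timings is an assumption (the sample numbers above happen to be consistent with it).

```python
from typing import Dict


def _row(indent: int, label: str, seconds: float) -> str:
    """Format one 'label: value' row with the value column aligned."""
    name = label + ":"
    pad = 23 - indent  # shrink the label field as the indent grows
    return f"{' ' * indent}{name:<{pad}}{seconds:>7.1f}s"


def format_timing_summary(load_s: float, compile_s: float,
                          stages: Dict[str, float]) -> str:
    """Render load, compile, and per-stage inference times as a text table."""
    rule = "=" * 50
    # Assumption: total inference time is the sum of the per-stage timings.
    inference_s = sum(stages.values())
    lines = [rule, "  TIMING SUMMARY", rule]
    lines.append(_row(2, "Load (checkpoint)", load_s))
    lines.append(_row(2, "Compile", compile_s))
    lines.append("  " + "─" * 40)
    lines.append(_row(2, "Inference", inference_s))
    for label, seconds in stages.items():
        lines.append(_row(4, label, seconds))
    lines.append(rule)
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_timing_summary(
        106.7, 81.5, {"Text Encoding": 3.6, "Denoising": 7.2}))
```

Keeping the stage timings in an insertion-ordered dictionary means the rows print in pipeline order without a separate ordering list.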

Config Updates: Added skip_first_n_steps_for_profiler and profiler_steps to the LTX2 configuration.

Memory Management: Explicitly deletes large tensors (out, videos, audios) before the profiling run to prevent OOM.
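The timing, multi-pass, and memory-management changes listed above fit together roughly as in the sketch below. It is an illustration under stated assumptions rather than the code in `generate_ltx2.py`: `timed`, `run_passes`, `generate`, `save`, `sync`, and `profiler` are all hypothetical names, and the profiler is passed in as a generic context-manager factory because the interface of `max_utils.Profiler` is not shown in this thread.

```python
import time
from contextlib import nullcontext
from typing import Any, Callable, ContextManager, Dict, Optional


def timed(name: str, timings: Dict[str, float], fn: Callable[[], Any],
          sync: Optional[Callable[[Any], Any]] = None) -> Any:
    """Run fn, optionally wait for its result, and record wall-clock seconds."""
    start = time.perf_counter()
    result = fn()
    if sync is not None:
        # With JAX, dispatch is asynchronous; a sync hook such as
        # jax.block_until_ready makes the timer cover the real device work.
        result = sync(result)
    timings[name] = time.perf_counter() - start
    return result


def run_passes(generate: Callable[[], Any],
               save: Callable[[Any], None],
               sync: Optional[Callable[[Any], Any]] = None,
               profile: bool = False,
               profiler: Callable[[], ContextManager] = nullcontext,
               ) -> Dict[str, float]:
    """Warmup pass, timed generation pass, then an optional profiling pass."""
    timings: Dict[str, float] = {}

    # Warmup pass: the first call pays for JIT compilation.
    warmup_out = timed("warmup", timings, generate, sync)
    del warmup_out

    # Generation pass: the output that is kept, with steady-state timing.
    out = timed("inference", timings, generate, sync)
    save(out)

    # Drop the reference to the large output before profiling so its
    # memory can be reclaimed, which lowers the risk of running out of memory.
    del out

    if profile:
        with profiler():
            timed("profiled", timings, generate, sync)
    return timings


if __name__ == "__main__":
    results = []
    print(run_passes(lambda: [0.0] * 8, results.append, profile=True))
```

In a JAX pipeline, `sync` would typically be `jax.block_until_ready`. The "warmup" figure here includes one full run in addition to compilation, so it is only an upper bound on compile time.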

@mbohlool mbohlool requested a review from entrpn as a code owner April 23, 2026 00:20

@mbohlool mbohlool force-pushed the mehdy_perf branch 2 times, most recently from c2eae2f to 6bd35bf Compare April 23, 2026 00:51
@Perseus14 (Collaborator) commented:

@mbohlool Could you add a table showing the latency gain of this change (single video and amortized throughput) compared with the baseline (main)?

Thanks!

@mbohlool mbohlool force-pushed the mehdy_perf branch 2 times, most recently from caaef98 to 6942969 Compare May 1, 2026 22:20
@mbohlool (Collaborator, Author) commented May 1, 2026

@Perseus14 I changed the PR to focus only on the timing and profiling part. I will explore the performance tweaking later. PTAL.

Comment thread src/maxdiffusion/pipelines/ltx2/ltx2_pipeline.py Outdated
Comment thread src/maxdiffusion/generate_ltx2.py
github-actions (Bot) commented May 2, 2026

🤖 Hi @Perseus14, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.

github-actions (Bot) commented May 2, 2026

🤖 I'm sorry @Perseus14, but I was unable to process your request. Please see the logs for more details.

@mbohlool changed the title from "perf: optimize LTX2 inference latency and implement granular TPU profiling" to "[LTX2] better timing and profiling capabilities" on May 4, 2026